We present herein a scheme by which to accurately evaluate the error exponents of a lossy data compression problem utilizing the replica method (RM). These exponents characterize the average probabilities, over a code ensemble, of compression failure above and of compression success below a critical compression rate, respectively. Although the existing method used in information theory (IT) is, in practice, limited to ensembles of randomly constructed codes, the proposed RM-based approach can be applied to a wider class of ensembles. This approach reproduces the optimal expressions of the error exponents achieved by the random code ensembles, which are known in IT. In addition, the proposed framework is used to show that codes composed of non-monotonic perceptrons of a specific type can provide the optimal exponents in most cases, which is supported by numerical experiments.
1. Introduction

Recent research activities in the cross-disciplinary field that combines information theory (IT) and statistical mechanics (SM) have shown that the typical performance of various codes, such as error-correcting and compression codes, can be characterized as phase transitions between phases representing the success or failure of coding when the message length $M$ becomes infinite.$^{1)}$ However, for finite $M$, the probabilities of coding failure in the success phase and of coding success in the failure phase do not vanish, and it is therefore of interest to estimate the probabilities of these events.

For a reasonable code ensemble, the averages of these probabilities over the ensemble scale asymptotically with respect to $M$ as $\exp[-M\alpha]$. Here, $\alpha\,(>0)$, which characterizes the asymptotic behavior, is often termed the error exponent. The evaluation of $\alpha$ is of theoretical interest and also of practical importance, since the error exponents can serve as a criterion for assessing coding performance at finite $M$.

More recently, it has been shown that the replica method (RM) developed in SM can be used for the accurate assessment of such exponents for error-correcting codes.$^{2,3)}$ Nevertheless, that method relies on specific properties of error-correcting codes, and the development of such techniques for other codes requires further investigation. Therefore, we herein provide a scheme by which to accurately evaluate the error exponents for lossy data compression of memoryless sources utilizing RM. The existing method used in IT has provided the optimal expressions of the error exponents.$^{4,5)}$ However, a precise assessment by the IT approach is, in practice, possible only for ensembles of randomly constructed codes that exhibit optimal performance. In contrast, our SM-based approach can accurately evaluate the coding performance for a wider class of ensembles.

The paper is organized as follows. In the next section, we briefly review the concept of 
lossy data compression and the definition of the error exponents. In §3, a statistical mechanical 
approach for the assessment of the error exponents is introduced. In §4, this approach is applied 
to the random code ensemble (RCE). Although the exponents evaluated here characterize the asymptotic behavior of the average probabilities over the ensemble, this analysis successfully reproduces the optimal exponents known in the IT literature when the ensemble is optimized. We also briefly discuss the RM-based evaluation of the exponents of the best code, which yields a result identical to that of the analysis for the average case. In addition to being
consistent with the existing IT results, a major advantage of the proposed method is the ability 
to accurately assess the exponents for suboptimal ensembles. This is demonstrated for a simple 
lossy compression problem of a binary memoryless source in §5. For this source, the error 
exponents are evaluated for a suboptimal code ensemble, composed of perceptrons, of practical 
codebook size using the developed RM-based approach. The validity of the assessment is also 
confirmed numerically. The final section is devoted to a summary. 

2. Lossy Data Compression and Error Exponents 

In this section, we present the notation used herein and briefly review the concept of lossy data compression of memoryless sources. Let us focus on a discrete message consisting of $M$ random variables $\boldsymbol{y}=(y_1,y_2,\ldots,y_M)$ ($y_\mu\in\mathcal{Y}=\{0,1,\ldots,J-1\}$), each component of which is assumed to be independently generated from an identical stationary distribution $P=(P(0),P(1),\ldots,P(J-1))$. Although the arguments below are for sources of discrete messages, the newly developed scheme can be directly extended to continuous memoryless sources, in which case the error exponents are expressed identically by replacing summations and distribution functions with integrals and density functions, respectively.

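The purpose of lossy data compression is to compress $\boldsymbol{y}$ into a binary expression $\boldsymbol{s}=(s_1,s_2,\ldots,s_N)$ ($s_i\in\{0,1\}$), allowing a certain amount of distortion between the original message $\boldsymbol{y}$ and the representative vector $\hat{\boldsymbol{y}}=(\hat y_1,\hat y_2,\ldots,\hat y_M)$ ($\hat y_\mu\in\hat{\mathcal{Y}}=\{0,1,\ldots,L-1\}$) when $\boldsymbol{y}$ is retrieved from $\boldsymbol{s}$. We deal herein with distortions of the single-letter fidelity criterion $d$ on $\mathcal{Y}\times\hat{\mathcal{Y}}$, which is defined by $d(j,l)\ge0$ ($j\in\mathcal{Y}$, $l\in\hat{\mathcal{Y}}$) and $\min_{l\in\hat{\mathcal{Y}}}\{d(j,l)\}=0$ ($\forall j\in\mathcal{Y}$); the distortion between vectors is then $d(\boldsymbol{y},\hat{\boldsymbol{y}})=\sum_{\mu=1}^M d(y_\mu,\hat y_\mu)$. For example, distortions for Boolean messages $\mathcal{Y}=\hat{\mathcal{Y}}=\{0,1\}$ are frequently measured using the Hamming distance, $d(\boldsymbol{y},\hat{\boldsymbol{y}})=\sum_{\mu=1}^M[1-\delta_{y_\mu,\hat y_\mu}]\ge0$, where $\delta_{x,y}$ is 1 if $x=y$ and 0 otherwise.

J. Phys. Soc. Jpn. Full Paper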

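A code $C$ is specified by a map $\hat{\boldsymbol{y}}(\boldsymbol{s};C):\boldsymbol{s}\to\hat{\boldsymbol{y}}$, which is used in the restoration phase. This reasonably determines the compression scheme as

$$\boldsymbol{s}(\boldsymbol{y};C)=\mathop{\mathrm{argmin}}_{\boldsymbol{s}}\{d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))\},\tag{1}$$

where $\mathrm{argmin}_{\boldsymbol{s}}\{\cdots\}$ represents the argument $\boldsymbol{s}$ that minimizes $\cdots$. When $C$ is generated from a certain code ensemble, typical codes satisfy the fidelity criterion

$$\frac{1}{M}\min_{\boldsymbol{s}}\{d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))\}=\frac{1}{M}\min_{\boldsymbol{s}}\left\{\sum_{\mu=1}^M d(y_\mu,\hat y_\mu(\boldsymbol{s};C))\right\}\le D,\tag{2}$$

for a given permissible distortion $D$ and typical messages $\boldsymbol{y}$ with probability 1 in the limit $M,N\to\infty$ keeping the coding rate $R=N/M$ constant, if and only if $R$ is larger than a certain critical rate $R_c(D)$.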
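To make eq. (1) concrete, the following Python sketch performs the exhaustive search over all $2^N$ indices $\boldsymbol{s}$ for one randomly drawn codebook and reports the per-component distortion that enters eq. (2). It assumes a binary source, Hamming distortion and codewords drawn i.i.d. from a distribution `Q`, as in the random code ensemble of §4; the helper names (`random_codebook`, `encode`) are hypothetical and serve only to illustrate the definition. The exponential cost in $N$ of this search is precisely what makes such codes impractical.

```python
import itertools
import random


def hamming(y, y_hat):
    """Hamming distortion d(y, y_hat): number of mismatched components."""
    return sum(1 for a, b in zip(y, y_hat) if a != b)


def random_codebook(N, M, Q, rng):
    """Hypothetical helper: give every index s in {0,1}^N a representative
    vector whose M components are drawn i.i.d. from the distribution Q."""
    symbols = list(range(len(Q)))
    return {s: tuple(rng.choices(symbols, weights=Q, k=M))
            for s in itertools.product((0, 1), repeat=N)}


def encode(y, codebook):
    """Eq. (1): exhaustive search for the s minimizing d(y, y_hat(s; C))."""
    return min(codebook, key=lambda s: hamming(y, codebook[s]))


rng = random.Random(0)
N, M = 4, 8                                   # coding rate R = N / M = 0.5
C = random_codebook(N, M, [0.5, 0.5], rng)    # uniform Q (an assumption)
y = tuple(rng.choices((0, 1), weights=[0.8, 0.2], k=M))  # biased binary source
s = encode(y, C)
lam = hamming(y, C[s]) / M                    # per-component distortion, compared with D in eq. (2)
print(s, lam)
```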

However, for finite $M$ and $N$, any code has a finite probability $P_F$ of breaking the fidelity criterion (2) even for $R>R_c(D)$. Similarly, for $R<R_c(D)$, eq. (2) is satisfied with a certain probability $P_S$. For reasonable code ensembles, the averages of these probabilities are expected to decay exponentially with respect to $M$ when the message length $M$ is sufficiently large. Therefore, the two error exponents

$$\alpha_A(D,R)=\lim_{M\to\infty}-\frac{1}{M}\ln\langle P_F\rangle_C\quad(R>R_c(D)),\qquad\alpha_B(D,R)=\lim_{M\to\infty}-\frac{1}{M}\ln\langle P_S\rangle_C\quad(R<R_c(D)),$$

where $\langle\cdots\rangle_C$ represents the average over the code ensemble, can be used to characterize the potential ability of the ensembles at finite message lengths. The development of a framework for evaluating these exponents utilizing RM is the primary goal of this paper.
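For very short messages, the averaged failure probability $\langle P_F\rangle_C$ can be estimated directly by simulation, which gives a feel for the quantity whose decay rate defines $\alpha_A(D,R)$. The sketch below is a plain Monte Carlo estimate, not the method of this paper. It assumes a binary source with $P(1)=p$, Hamming distortion and uniformly random codewords, and it draws a fresh code and a fresh message in every trial so that both averages are taken. At such small $M$, $-M^{-1}\ln\langle P_F\rangle_C$ is only a crude finite-size proxy for the exponent.

```python
import math
import random


def failure_probability(N, M, D, p, trials, seed=0):
    """Monte Carlo estimate of <P_F>_C: the fraction of (code, message) draws
    for which even the best codeword violates the fidelity criterion (2)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # One random code: 2^N codewords with uniform i.i.d. binary components (assumed).
        codewords = [[rng.random() < 0.5 for _ in range(M)] for _ in range(2 ** N)]
        # One source message with P(1) = p.
        y = [rng.random() < p for _ in range(M)]
        best = min(sum(a != b for a, b in zip(y, c)) for c in codewords) / M
        if best > D:
            failures += 1
    return failures / trials


M, N, D, p = 12, 6, 0.1, 0.2   # R = 0.5, above R_c(D) for this source (illustrative values)
pf = failure_probability(N, M, D, p, trials=1000)
if pf > 0:
    print(pf, -math.log(pf) / M)   # crude finite-M proxy for alpha_A(D, R)
```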

3. Statistical Mechanical Approach to Error Exponents 

3.1 Free energy as a lower bound of distortion

Let us develop an analytical framework to assess the error exponents using RM. For this, we first regard the distortion function $d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))$ as the Hamiltonian for the dynamical variable $\boldsymbol{s}$, which also depends on the predetermined variables $\boldsymbol{y}$ and $C$. In the compression process, the optimal sequence is chosen as in eq. (1). As the original message and the code are generated from a stationary distribution $P$ and the code ensemble, respectively, the resulting distortion (per component) $\lambda(\boldsymbol{y},C)=\min_{\boldsymbol{s}}\{M^{-1}d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))\}$ is also expected to obey a certain distribution $P(\lambda,R)$.

In the thermodynamic limit, $P(\lambda,R)$ is expected to peak at the typical value $\lambda=D_t(R)$ and decay exponentially away from $D_t(R)$ as $P(\lambda,R)\sim\exp[-Mh(\lambda,R)]$. This indicates that $\langle P_F\rangle_C=\int_D^\infty P(\lambda,R)\,d\lambda\sim P(D,R)\sim\exp[-Mh(D,R)]$ for $D>D_t(R)$ (or $R>R_c(D)$) and $\langle P_S\rangle_C=\int_0^D P(\lambda,R)\,d\lambda\sim P(D,R)\sim\exp[-Mh(D,R)]$ for $D<D_t(R)$ (or $R<R_c(D)$). Therefore, we can express the error exponents $\alpha(D,R)$ using $h(D,R)$ for both cases (Fig. 1).
Fig. 1. Schematic profile of the distribution $P(\lambda,R)$. $D_t(R)$ indicates the typical value of the distortion $M^{-1}\min_{\boldsymbol{s}}\{d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))\}$. As $M,N\to\infty$ keeping $R=N/M$ fixed, the probability of compression failure $\langle P_F\rangle_C$ for a given $D\,(>D_t(R))$, represented as the shaded area in (a), tends toward zero. Similarly, the probability of compression success $\langle P_S\rangle_C$ for a given $D\,(<D_t(R))$ is illustrated in (b). The error exponents are defined to characterize the decay rates of these average probabilities.



In order to assess the distribution $P(\lambda,R)$, we next utilize the inequality

$$e^{-M\beta\lambda(\boldsymbol{y},C)}\le\sum_{\boldsymbol{s}}e^{-\beta d(\boldsymbol{y},\hat{\boldsymbol{y}}(\boldsymbol{s};C))}=Z(\beta;\boldsymbol{y},C)=e^{-M\beta f(\beta;\boldsymbol{y},C)},\tag{3}$$

which holds for any set of $\beta>0$, $\boldsymbol{y}$ and $C$. The physical implication is that the ground-state energy $\lambda(\boldsymbol{y},C)$ (per component) is lower bounded by the free energy $f(\beta;\boldsymbol{y},C)$ (per component) for an arbitrary temperature $\beta^{-1}>0$. In particular, $f(\beta;\boldsymbol{y},C)$ agrees with $\lambda(\boldsymbol{y},C)$ in the zero-temperature limit $\beta\to\infty$. This means that we can evaluate $P(\lambda,R)$ by first assessing the distribution of $f(\beta;\boldsymbol{y},C)$, $P(f;\beta)$, for general finite $\beta>0$, and then taking the limit $\beta\to\infty$. Note that although most of the quantities appearing in this paper actually depend on the coding rate $R$, this dependence is not written explicitly for some quantities, such as $P(f;\beta)$, $c(f,\beta)$ and $g(n,\beta)$.
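The bound (3) and its zero-temperature limit are easy to check numerically for a single realization of $\boldsymbol{y}$ and $C$. The Python sketch below assumes, for illustration only, a binary source, uniformly random binary codewords and Hamming distortion, and evaluates $f(\beta;\boldsymbol{y},C)=-(\beta M)^{-1}\ln Z$ with a log-sum-exp shift for numerical stability. Since $1\le Z e^{M\beta\lambda}\le2^N$, the gap $\lambda-f$ lies between 0 and $N\ln2/(\beta M)$ and therefore vanishes as $\beta\to\infty$.

```python
import math
import random


def free_energy(beta, distortions, M):
    """f(beta; y, C) = -(1/(beta*M)) ln sum_s exp(-beta * d(y, y_hat(s; C))),
    computed by factoring out the minimum distortion (log-sum-exp shift)."""
    d_min = min(distortions)
    shifted = sum(math.exp(-beta * (d - d_min)) for d in distortions)
    return (d_min - math.log(shifted) / beta) / M


rng = random.Random(1)
N, M = 5, 10
y = [rng.random() < 0.2 for _ in range(M)]                          # assumed biased source
codewords = [[rng.random() < 0.5 for _ in range(M)] for _ in range(2 ** N)]
distortions = [sum(a != b for a, b in zip(y, c)) for c in codewords]  # Hamming d(y, y_hat(s))
lam = min(distortions) / M                                          # ground-state energy per component

for beta in (0.5, 2.0, 10.0, 100.0):
    f = free_energy(beta, distortions, M)
    print(beta, f, lam)        # f <= lam, and f approaches lam as beta grows
```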

3.2 Assessment of the error exponents from the moment of the partition function

$P(f;\beta)$ is also expected to peak at its typical value

$$f_t(\beta)=-\frac{1}{\beta M}\langle\ln Z(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}=-\lim_{n\to0}\frac{\partial}{\partial n}\frac{1}{\beta M}\ln\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C},\tag{4}$$

and decay exponentially away from $f_t(\beta)$ as $P(f;\beta)\sim\exp[-Mc(f,\beta)]$ for large $M$. Here, we assume that $c(f,\beta)\ge0$ is a convex downward function minimized to 0 at $f=f_t(\beta)$. This implies that, for $\forall n\in\mathbb{R}$, the moment of the partition function $Z(\beta;\boldsymbol{y},C)$, $\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}$, can be evaluated by the saddle point method as

$$\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}\simeq\exp[-M\{n\beta f^*+c(f^*,\beta)\}],\tag{5}$$

where $\langle\cdots\rangle_{\boldsymbol{y},C}$ denotes the average over $\boldsymbol{y}$ and $C$, and $f^*$ represents the value at the saddle point, which leads to the Legendre transformation

$$g(n,\beta)=-\frac{\ln\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}}{M}=\min_f\{n\beta f+c(f,\beta)\}.\tag{6}$$

Figure 2 illustrates graphically the meaning of this evaluation. Given $n$, the minimization in eq. (6) provides a condition for determining the dominant $f$ as

$$-\frac{1}{\beta}\frac{\partial c(f,\beta)}{\partial f}=n.\tag{7}$$

For each $\beta$, this can be solved pictorially by searching for the point $f$ at which the tangential slope of the function $y=-\beta^{-1}c(f,\beta)$ agrees with $n$. Since $\beta$ is positive, $y=-\beta^{-1}c(f,\beta)$ is a convex upward function. This indicates that $n=0$, $n>0$ and $n<0$ correspond to the typical value $f=f_t$, to $f<f_t$ and to $f>f_t$, respectively, which provides a useful clue for assessing the exponents.

Based on eq. (6), the exponent $c(f,\beta)$ that characterizes the distribution of the free energy $P(f;\beta)$ can be assessed from $g(n,\beta)$ by the inverse Legendre transformation

$$c(f,\beta)=\max_n\{-n\beta f+g(n,\beta)\},\tag{8}$$

where $\max_x\{\cdots\}$ denotes the maximization of $\cdots$ with respect to $x$. Here, $g(n,\beta)$ can be evaluated using RM by analytically extending expressions obtained for $n\in\mathbb{N}$ to $n\in\mathbb{R}$, provided that $f$ is included in the support of $P(f;\beta)$, which we assume below. This enables the evaluation of the error exponent $\alpha(D,R)$, where $D$ is assumed to be included in the support of $P(\lambda,R)$ throughout this paper, as $\alpha(D=f,R)=h(\lambda=f,R)=c(f,\beta\to\infty)$ by taking the zero-temperature limit $\beta\to\infty$. The extremum with respect to $n$ in eq. (8) is characterized by the condition

$$\frac{1}{\beta}\frac{\partial g(n,\beta)}{\partial n}=f\tag{9}$$

for a given $f$, indicating that the exponent $\alpha_{\{A,B\}}(D,R)$, which is an abbreviation denoting $\alpha_A(D,R)$ and $\alpha_B(D,R)$ for $R>R_c(D)$ and $R<R_c(D)$, respectively, can be assessed as

$$\alpha_{\{A,B\}}(D,R)=\lim_{\beta\to\infty}c(f=D,\beta)=\lim_{\beta\to\infty}\left\{-n\frac{\partial g(n,\beta)}{\partial n}+g(n,\beta)\right\},\tag{10}$$

where $n$ in eq. (10) is a function of $\beta$ that is determined by the condition

$$\frac{1}{\beta}\frac{\partial g(n,\beta)}{\partial n}=D.\tag{11}$$

Equations (10) and (11) constitute the basis of our approach.
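Equations (8) and (9) form an ordinary Legendre transformation, which can be checked on a toy model where everything is known in closed form. The Python sketch below assumes, purely for illustration, a Gaussian large-deviation function $c(f,\beta)=(f-f_t)^2/(2\sigma^2)$, for which eq. (6) gives $g(n,\beta)=n\beta f_t-n^2\beta^2\sigma^2/2$. It recovers $c(f,\beta)$ by maximizing $-n\beta f+g(n,\beta)$ over $n$ with a ternary search (valid here because $g$ is concave in $n$) and shows that the maximizing $n$ is negative for $f>f_t$ and positive for $f<f_t$, as in Fig. 2.

```python
def legendre_c(f, g, beta, lo=-50.0, hi=50.0, iters=200):
    """Eq. (8): c(f, beta) = max_n {-n*beta*f + g(n, beta)}, by ternary search.
    Correct only if the objective is concave (unimodal) in n on [lo, hi]."""
    def objective(n):
        return -n * beta * f + g(n, beta)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    n_star = 0.5 * (lo + hi)
    return objective(n_star), n_star


F_T, SIGMA2, BETA = 0.2, 0.05, 2.0   # toy parameters (assumed)


def g_toy(n, beta):
    """g(n, beta) obtained from eq. (6) for the Gaussian toy c(f, beta)."""
    return n * beta * F_T - 0.5 * (n * beta) ** 2 * SIGMA2


for f in (0.1, 0.2, 0.3):
    c, n_star = legendre_c(f, g_toy, BETA)
    exact = (f - F_T) ** 2 / (2 * SIGMA2)
    print(f, c, exact, n_star)       # n_star < 0 for f > f_t and n_star > 0 for f < f_t
```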

It is necessary to mention two points here. First, $\alpha_A(D,R)$ is evaluated for $R>R_c(D)$, or $D>D_t(R)$ for fixed $R$, where the typical distortion $D_t(R)$ can be evaluated as $D_t(R)=\lim_{\beta\to\infty}f_t(\beta)$. Since Fig. 2 indicates that $f>f_t$ corresponds to $n<0$, the $n$ determined from eq. (11) becomes negative in the assessment of $\alpha_A(D,R)$. Similarly, $n>0$ is obtained for $\alpha_B(D,R)$. Second, we assume that $c(f,\beta)$ is a convex downward function of $f$ for $\forall\beta$, which may not hold in certain situations. In such cases, evaluation based on eqs. (10) and (11) provides lower bounds of the error exponents due to the nature of the Legendre transformation.

Fig. 2. Graphical scheme used to solve eq. (7), enabling the Legendre transformation of eq. (6) to be performed.

4. Application to the Random Code Ensemble 

4.1 The random code ensemble

In order to show that the assessment of the error exponents based on eqs. (10) and (11) is consistent with existing results, we first apply this method to the random code ensemble (RCE), which has been studied extensively in the IT literature.$^{6,7)}$

The RCE is an ensemble characterized by the component-wise random construction of a map $\hat{\boldsymbol{y}}(\boldsymbol{s};C)$ from $\boldsymbol{s}$ to representative sequences $\hat{\boldsymbol{y}}$ following an identical distribution $Q=(Q(0),Q(1),\ldots,Q(L-1))$, as

$$\mathrm{Prob}\{\hat y_\mu(\boldsymbol{s};C)=l\}=Q(l),\tag{12}$$

where $Q(l)\ge0$ ($l=0,1,\ldots,L-1$) and $\sum_{l\in\hat{\mathcal{Y}}}Q(l)=1$. The correspondence between $\boldsymbol{s}$ and $\hat{\boldsymbol{y}}(\boldsymbol{s})$, termed a codebook, is known to both the compressor and the decompressor. The size of the codebook of the RCE grows as $O(M\times2^N)$, which makes compressing a given message computationally difficult when the message lengths $N$ and $M$ are large, because no compression method other than looking up the codebook exists. This prevents the RCE from being practical. However, this ensemble exhibits optimal compression performance when appropriately tuned, and so analysis of the RCE is important for clarifying the theoretical limitations of the framework of lossy data compression.
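Let us evaluate $g(n,\beta)$ for the RCE utilizing RM in order to assess eqs. (10) and (11). For this, we insert the identity $1=\sum_{q_{ab}=0,1}\prod_{a>b}\delta(\delta_{\boldsymbol{s}^a,\boldsymbol{s}^b}-q_{ab})$ ($a,b=1,2,\ldots,n$) into $Z^n(\beta;\boldsymbol{y},C)$ for $n\in\mathbb{N}$ and take the averages over $\boldsymbol{y}$ and $C$, which yields

$$g(n,\beta)=-\mathop{\mathrm{extr}}_{\{q_{ab}\in\{0,1\}\}}\left\{\frac{1}{M}\ln\left[\mathrm{Tr}_{\boldsymbol{s}^1,\ldots,\boldsymbol{s}^n}\prod_{a>b}\delta(\delta_{\boldsymbol{s}^a,\boldsymbol{s}^b}-q_{ab})\left\langle\prod_{a=1}^n e^{-\beta d(y,\hat y(\boldsymbol{s}^a))}\right\rangle_{y,\hat y(\boldsymbol{s}^1),\ldots,\hat y(\boldsymbol{s}^n)}^{M}\right]\right\},\tag{13}$$

where the summation $\sum_{q_{ab}=0,1}$ is replaced with the extremization $\mathrm{extr}_{\{q_{ab}\in\{0,1\}\}}$, which is valid for $M\to\infty$, and $\langle\cdots\rangle_{y,\hat y(\boldsymbol{s}^1),\ldots,\hat y(\boldsymbol{s}^n)}$ denotes the averages over the distributions $P$ and $\{Q(\hat y(\boldsymbol{s}^a))\}$ ($a=1,2,\ldots,n$); for a given $\{q_{ab}\}$, $\hat y(\boldsymbol{s}^a)=\hat y(\boldsymbol{s}^b)$ if $q_{ab}=1$, whereas the two are drawn independently from $Q$ otherwise.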

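In order to utilize this expression for real (and, more generally, complex) $n$, we first employ the simplest replica symmetric (RS) ansatz $q_{ab}=q$ ($a>b=1,2,\ldots,n$). The value of $q$ is limited to only 0 or 1 in the current system, yielding two RS solutions:

$$g_{\mathrm{RS1}}(n,\beta)=-\ln\left[\sum_{j\in\mathcal{Y}}P(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}\right\}^n\right]-nR\ln2,\tag{14}$$

and

$$g_{\mathrm{RS2}}(n,\beta)=-\ln\left[\sum_{j\in\mathcal{Y}}P(j)\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-n\beta d(j,l)}\right]-R\ln2,\tag{15}$$

which correspond to $q=0$ and $q=1$, respectively.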
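For a quick numerical look at eqs. (14) and (15), the Python sketch below implements both RS solutions for a generic distortion table and checks two properties used later in the text: $g_{\mathrm{RS1}}(0,\beta)=0$ whereas $g_{\mathrm{RS2}}(0,\beta)=-R\ln2$ (cf. eq. (24)), and the two solutions coincide at $n=1$. The binary source, the code distribution $Q$ and the Hamming distortion chosen here are illustrative assumptions.

```python
import math


def g_rs1(n, beta, P, Q, d, R):
    """Eq. (14): RS solution with q = 0 (replicas sit at different indices s)."""
    total = sum(P[j] * sum(Q[l] * math.exp(-beta * d[j][l]) for l in range(len(Q))) ** n
                for j in range(len(P)))
    return -math.log(total) - n * R * math.log(2)


def g_rs2(n, beta, P, Q, d, R):
    """Eq. (15): RS solution with q = 1 (all replicas sit at the same index s)."""
    total = sum(P[j] * Q[l] * math.exp(-n * beta * d[j][l])
                for j in range(len(P)) for l in range(len(Q)))
    return -math.log(total) - R * math.log(2)


P, Q = [0.8, 0.2], [0.7, 0.3]          # source and code distributions (assumed)
HAMMING = [[0.0, 1.0], [1.0, 0.0]]     # d(j, l) for binary Hamming distortion
R, beta = 0.4, 3.0

print(g_rs1(0.0, beta, P, Q, HAMMING, R))   # ~0: satisfies eq. (24)
print(g_rs2(0.0, beta, P, Q, HAMMING, R))   # -R ln 2: violates eq. (24)
print(g_rs1(1.0, beta, P, Q, HAMMING, R) - g_rs2(1.0, beta, P, Q, HAMMING, R))  # ~0
```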



4.3 Critical conditions and the frozen replica symmetry breaking solution

We now have two RS solutions, (14) and (15). These solutions, however, become invalid unless both of the following two conditions are satisfied; the violation of either signals the breakdown of the RS ansatz.

The first condition concerns the local stability of the RS saddle point with respect to infinitesimal disturbances that break the replica symmetry of the order parameters, which is often termed the de Almeida-Thouless (AT) condition.$^{8)}$ However, such a disturbance is not allowed in the current system, since the order parameters $q_{ab}=\delta_{\boldsymbol{s}^a,\boldsymbol{s}^b}$ are discrete. Therefore, we expect that the AT stability condition is always satisfied by both solutions for the RCE, although the stability must be examined for other code ensembles.

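The other condition concerns the entropy of the dynamical variable $\boldsymbol{s}$. Equations (3) and (6) indicate that the equality

$$s(n,\beta)=-\frac{\partial g(n,\beta)}{\partial n}+\frac{\beta}{n}\frac{\partial g(n,\beta)}{\partial\beta}=\frac{1}{M}\frac{\left\langle Z^n(\beta;\boldsymbol{y},C)\left\{\ln Z(\beta;\boldsymbol{y},C)-\beta\,\partial\ln Z(\beta;\boldsymbol{y},C)/\partial\beta\right\}\right\rangle_{\boldsymbol{y},C}}{\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}}\tag{16}$$

holds for any pair of $n$ and $\beta>0$. Since $\ln Z(\beta;\boldsymbol{y},C)-\beta\,\partial\ln Z(\beta;\boldsymbol{y},C)/\partial\beta$ represents the entropy of the discrete dynamical variable $\boldsymbol{s}$ given $\boldsymbol{y}$ and $C$, eq. (16) must be non-negative as long as $g(n,\beta)$ is correctly evaluated.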
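The entropy condition can be monitored numerically. The sketch below evaluates eq. (16) by central finite differences for the two RS solutions in an illustrative setting chosen for simplicity (uniform binary source and code distribution, Hamming distortion, $R=0.5$; all assumptions). In this symmetric case, eq. (14) gives $s(n,\beta)=\ln[(1+e^{-\beta})/2]+R\ln2+\beta e^{-\beta}/(1+e^{-\beta})$ independently of $n$, which is positive for small $\beta$ and tends to $(R-1)\ln2<0$ as $\beta\to\infty$; the RS1 entropy therefore changes sign at some $\beta_c$, whereas the RS2 entropy stays at zero.

```python
import math

P = Q = [0.5, 0.5]                    # symmetric binary source and code (assumed)
D_TABLE = [[0.0, 1.0], [1.0, 0.0]]    # Hamming distortion d(j, l)
R = 0.5


def g_rs1(n, beta):
    """Eq. (14) for the setting above."""
    total = sum(P[j] * sum(Q[l] * math.exp(-beta * D_TABLE[j][l]) for l in range(2)) ** n
                for j in range(2))
    return -math.log(total) - n * R * math.log(2)


def g_rs2(n, beta):
    """Eq. (15) for the setting above."""
    total = sum(P[j] * Q[l] * math.exp(-n * beta * D_TABLE[j][l])
                for j in range(2) for l in range(2))
    return -math.log(total) - R * math.log(2)


def entropy(g, n, beta, h=1e-5):
    """Eq. (16): s(n, beta) = -dg/dn + (beta / n) dg/dbeta, by central differences."""
    dg_dn = (g(n + h, beta) - g(n - h, beta)) / (2 * h)
    dg_dbeta = (g(n, beta + h) - g(n, beta - h)) / (2 * h)
    return -dg_dn + beta / n * dg_dbeta


for beta in (0.1, 1.0, 5.0, 20.0):
    print(beta, entropy(g_rs1, 0.5, beta), entropy(g_rs2, 0.5, beta))
```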

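Substituting eq. (15) into eq. (16) yields $s(n,\beta)=0$, which indicates that $g_{\mathrm{RS2}}(n,\beta)$ always critically satisfies this entropy condition. However, for $g_{\mathrm{RS1}}(n,\beta)$, $-\partial g_{\mathrm{RS1}}(n,\beta)/\partial n+(\beta/n)\,\partial g_{\mathrm{RS1}}(n,\beta)/\partial\beta$ generally vanishes at a certain critical value $\beta=\beta_c$, signaling the breakdown of the replica symmetry as $\beta$ is increased. In such cases, one promising method for obtaining the correct solution is to employ the one-step (frozen) replica symmetry breaking (1RSB) ansatz, partitioning the replicated systems into $n/m$ subgroups of identical size $m$ and assuming that $q_{ab}=1$ if $a$ and $b$ belong to the same subgroup, and 0 otherwise.$^{9)}$ Extremizing $\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}$ with respect to $m$ yields the 1RSB solution for $\beta>\beta_c$ as

$$g_{\mathrm{1RSB}}(n,\beta)=\mathop{\mathrm{extr}}_m\left\{g_{\mathrm{RS1}}\left(\frac{n}{m},m\beta\right)\right\}=g_{\mathrm{RS1}}(n^*,\beta^*),\tag{17}$$

where $n^*$ and $\beta^*$ are assessed from the coupled equations

$$n^*\beta^*=n\beta,\tag{18}$$

$$-\frac{\partial g_{\mathrm{RS1}}(n^*,\beta^*)}{\partial n^*}+\frac{\beta^*}{n^*}\frac{\partial g_{\mathrm{RS1}}(n^*,\beta^*)}{\partial\beta^*}=0,\tag{19}$$

which guarantee that $s(n,\beta)$ is non-negative (zero) for $g_{\mathrm{1RSB}}(n,\beta)$.

Equations (18) and (19) indicate that eq. (11) for $g_{\mathrm{1RSB}}(n,\beta)$ is reduced to a condition on $g_{\mathrm{RS1}}(n,\beta)$ as

$$\frac{1}{\beta}\frac{\partial g_{\mathrm{1RSB}}(n,\beta)}{\partial n}=\frac{1}{\beta^*}\frac{\partial g_{\mathrm{RS1}}(n^*,\beta^*)}{\partial n^*}=D.\tag{20}$$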

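Equations (17), (19) and (20) indicate that the error exponents can be practically evaluated without using the 1RSB solution as

$$\alpha_{\{A,B\}}(D,R)=\lim_{\beta\to\infty}\left\{-n\frac{\partial g_{\mathrm{1RSB}}(n,\beta)}{\partial n}+g_{\mathrm{1RSB}}(n,\beta)\right\}=-n\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial n}+g_{\mathrm{RS1}}(n,\beta),\tag{21}$$

where $n$ and $\beta$ on the right-hand side, which stand for $n^*$ and $\beta^*$ of eqs. (17) to (20) and remain finite as $\beta\to\infty$, are determined by

$$\frac{1}{\beta}\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial n}=D,\tag{22}$$

$$-\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial n}+\frac{\beta}{n}\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial\beta}=0,\tag{23}$$

when $g_{\mathrm{RS1}}(n,\beta)$ is selected as the relevant solution, despite the fact that $g_{\mathrm{RS1}}(n,\beta)$ itself becomes invalid for $\beta\to\infty$.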
4.4 Assessment of the error exponents

We are now ready to evaluate $\alpha_{\{A,B\}}(D,R)$ for the RCE using the two RS solutions. We first consider the failure exponent $\alpha_A(D,R)$ for $R>R_c(D)$.

4.4.1 $\alpha_A(D,R)$

In order to assess this exponent, we must select the relevant solution from $g_{\mathrm{RS1}}(n,\beta)$ and $g_{\mathrm{RS2}}(n,\beta)$. Note that $g_{\mathrm{RS2}}(n,\beta)$ cannot be relevant for $n<0$, because this solution does not satisfy the trivial identity

$$\lim_{n\to0}g(n,\beta)=-\lim_{n\to0}\frac{1}{M}\ln\langle Z^n(\beta;\boldsymbol{y},C)\rangle_{\boldsymbol{y},C}=-\frac{1}{M}\ln(1)=0\tag{24}$$

(eq. (15) gives $g_{\mathrm{RS2}}(0,\beta)=-R\ln2$), and therefore the analytic continuation of this solution from $n\in\mathbb{N}$ to $n<0$ is not reliable. Since $\alpha_A(D,R)$ corresponds to $n<0$, we adopt $g_{\mathrm{RS1}}(n,\beta)$ for the evaluation of $\alpha_A(D,R)$.

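Inserting (14) into eqs. (22) and (23) yields

$$\sum_{j\in\mathcal{Y}}U_1(j)\ln\left[\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}\right]+R\ln2+\beta D=0,\tag{25}$$

$$\sum_{j\in\mathcal{Y}}U_1(j)\sum_{l\in\hat{\mathcal{Y}}}V_1(l|j)d(j,l)=D,\tag{26}$$

where the probability distributions $U_1=(U_1(0),U_1(1),\ldots,U_1(J-1))$ and $V_1=\{V_1(l|j)\}$ ($j\in\mathcal{Y}$, $l\in\hat{\mathcal{Y}}$) are defined as

$$U_1(j)=\frac{P(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}\right\}^n}{\sum_{j'\in\mathcal{Y}}P(j')\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j',l)}\right\}^n},\tag{27}$$

$$V_1(l|j)=\frac{Q(l)e^{-\beta d(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta d(j,l')}}.\tag{28}$$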



Inserting eqs. (22), (14) and (25) into eq. (21) yields the following expression for the error exponent:

$$\alpha_A(D,R)=\sum_{j\in\mathcal{Y}}U_1(j)\ln\left[\frac{U_1(j)}{P(j)}\right]=KL(U_1\|P),\tag{29}$$

where $KL(\cdot\|\cdot)$ is termed the Kullback-Leibler divergence.$^{10)}$
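The step from eq. (21) to eq. (29) can be verified numerically. Differentiating eq. (14) gives $\partial g_{\mathrm{RS1}}/\partial n=-\sum_jU_1(j)\ln\sum_lQ(l)e^{-\beta d(j,l)}-R\ln2$, so the $R$-dependent terms cancel and $-n\,\partial g_{\mathrm{RS1}}/\partial n+g_{\mathrm{RS1}}=KL(U_1\|P)$ holds identically for any $(n,\beta)$ and $Q$; eqs. (22) and (23) only determine to which $(D,R)$ this value belongs. The Python sketch below compares a finite-difference evaluation of eq. (21) with eq. (29), using a binary source, code distribution and Hamming distortion that are our own illustrative assumptions.

```python
import math

P, Q, R = [0.8, 0.2], [0.7, 0.3], 0.4   # illustrative choices (assumed)
D_TABLE = [[0.0, 1.0], [1.0, 0.0]]      # Hamming distortion d(j, l)


def z(j, beta):
    """sum_l Q(l) exp(-beta d(j, l))."""
    return sum(Q[l] * math.exp(-beta * D_TABLE[j][l]) for l in range(len(Q)))


def g_rs1(n, beta):
    """Eq. (14)."""
    return -math.log(sum(P[j] * z(j, beta) ** n for j in range(len(P)))) - n * R * math.log(2)


def u1(n, beta):
    """Eq. (27)."""
    w = [P[j] * z(j, beta) ** n for j in range(len(P))]
    return [wj / sum(w) for wj in w]


def kl(U, V):
    """Kullback-Leibler divergence KL(U||V) in nats."""
    return sum(u * math.log(u / v) for u, v in zip(U, V) if u > 0)


n, beta, h = -0.6, 3.0, 1e-6
dg_dn = (g_rs1(n + h, beta) - g_rs1(n - h, beta)) / (2 * h)
alpha_eq21 = -n * dg_dn + g_rs1(n, beta)    # right-hand side of eq. (21)
alpha_eq29 = kl(u1(n, beta), P)             # eq. (29)
print(alpha_eq21, alpha_eq29)               # agree up to finite-difference error
```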

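Equation (29) characterizes the average performance of the RCE specified by $Q$. Therefore, the performance can be improved by maximizing eq. (29) with respect to $Q$ under the constraints $\sum_lQ(l)=1$ and $Q(l)\ge0$, which reduces to

$$\sum_{j\in\mathcal{Y}}U_1(j)\frac{e^{-\beta d(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta d(j,l')}}=1\quad\Leftrightarrow\quad Q(l)=\sum_{j\in\mathcal{Y}}U_1(j)V_1(l|j),\qquad\forall l\in\hat{\mathcal{Y}}.\tag{30}$$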



The set of $n$, $\beta$ and $Q$ that optimizes the exponent for given $D$ and $R$ can be searched for by the following scheme, which is often termed the Arimoto-Blahut algorithm (ABA).$^{10,12,13)}$ We begin with initial conditions of $n\,(<0)$, $\beta\,(>0)$ and $Q$. Keeping $Q$ fixed, $n$ and $\beta$ are first updated by solving eqs. (25) and (26) with respect to these variables, which yields $U_1$ and $V_1$ via eqs. (27) and (28). Next, $Q$ is updated from eq. (30) using the obtained $U_1$ and $V_1$. These procedures are iterated until $n$, $\beta$ and $Q$ converge, which is guaranteed by the convexity of the mutual information.$^{10)}$ Then, the optimized exponent for the given $D$ and $R$ is obtained by substituting the convergent solution into eq. (29).

In practice, it is much more convenient to treat $n$ and $\beta$, rather than $D$ and $R$, as control parameters: after solving for $Q$ at the given $n$ and $\beta$ by simply iterating eqs. (27), (28) and (30), $D$ and $R$ are easily obtained from eqs. (26) and (25). Inserting the optimal $U_1$, which is given by the solved $Q$ via eq. (27), into eq. (29) and varying $n$ and $\beta$, the $\alpha_A(D,R)$ surface is swept out.
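The sweep described above can be sketched in a few lines of Python. The function name and signature are our own, and the uniform starting point for $Q$ and the stopping rule are arbitrary choices. The function fixes $(n,\beta)$, iterates eqs. (27), (28) and (30) until $Q$ stops changing, and then returns $D$ from eq. (26), $R$ from eq. (25) and $\alpha_A$ from eq. (29). For a binary source with Hamming distortion at $n=0$, the iteration reduces to the standard Blahut-Arimoto computation of the rate-distortion curve, so it should reproduce $D=1/(1+e^{\beta})$ (while this is below $\min(p,1-p)$), $R=h_2(p)-h_2(D)$ and $\alpha_A=0$, with $h_2$ the binary entropy in bits; we use that textbook closed form only as a check.

```python
import math


def sweep_point(n, beta, P, D_TABLE, iters=10000, tol=1e-13):
    """One point of the alpha_A(D, R) surface for control parameters (n, beta).

    Q is updated with eqs. (27), (28) and (30) until it converges; then D, R and
    alpha_A follow from eqs. (26), (25) and (29). Returns (D, R_bits, alpha_A, Q).
    """
    J, L = len(D_TABLE), len(D_TABLE[0])

    def stats(Q):
        z = [sum(Q[l] * math.exp(-beta * D_TABLE[j][l]) for l in range(L)) for j in range(J)]
        w = [P[j] * z[j] ** n for j in range(J)]
        U1 = [wj / sum(w) for wj in w]                                   # eq. (27)
        V1 = [[Q[l] * math.exp(-beta * D_TABLE[j][l]) / z[j] for l in range(L)]
              for j in range(J)]                                         # eq. (28)
        return z, U1, V1

    Q = [1.0 / L] * L                                                    # arbitrary start
    for _ in range(iters):
        _, U1, V1 = stats(Q)
        Q_new = [sum(U1[j] * V1[j][l] for j in range(J)) for l in range(L)]  # eq. (30)
        done = max(abs(a - b) for a, b in zip(Q, Q_new)) < tol
        Q = Q_new
        if done:
            break

    z, U1, V1 = stats(Q)
    D = sum(U1[j] * V1[j][l] * D_TABLE[j][l] for j in range(J) for l in range(L))  # eq. (26)
    R = -(sum(U1[j] * math.log(z[j]) for j in range(J)) + beta * D) / math.log(2)  # eq. (25)
    alpha = sum(u * math.log(u / p) for u, p in zip(U1, P) if u > 0)                 # eq. (29)
    return D, R, alpha, Q


HAMMING = [[0.0, 1.0], [1.0, 0.0]]
P = [0.8, 0.2]                       # binary source with P(1) = 0.2 (assumed)
for n in (0.0, -0.25, -0.5):
    D, R, alpha, Q = sweep_point(n, 3.0, P, HAMMING)
    print(f"n={n:+.2f}  D={D:.4f}  R={R:.4f}  alpha_A={alpha:.4f}")
```

Moving $n$ further below zero at fixed $\beta$ raises $R$ above the rate-distortion value of the source and increases $\alpha_A$, which is how the surface is traced out.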



4.4.2 $\alpha_B(D,R)$

Next, we turn to the success exponent $\alpha_B(D,R)$ for $R<R_c(D)$. Since we expect that $g(n,\beta)$ is analytic except at a few possible singular points of $n$, $g_{\mathrm{RS1}}(n,\beta)$ is likely to be relevant for $n>0$ as well, at least in the vicinity of $n=0$, because this solution is supposed to be relevant for $n<0$. Then, the exponent is obtained as $\alpha_B(D,R)=KL(U_1\|P)$, which is similar to $\alpha_A(D,R)$.

However, the validity of this expression in the present case must be examined, because $g_{\mathrm{RS2}}(n,\beta)$ can be relevant for $n>0$. For this, we illustrate schematic profiles of $g_{\mathrm{RS1}}(n,\beta)$ and $g_{\mathrm{RS2}}(n,\beta)$ for a fixed $\beta$ in Fig. 3.

Equations (14) and (15) indicate that $g_{\mathrm{RS1}}(n,\beta)$ and $g_{\mathrm{RS2}}(n,\beta)$, which intersect each other at $n=1$ for $\forall\beta>0$, are both convex upward with respect to $n$. As a function of $n$, $g_{\mathrm{RS2}}(n,\beta)$ increases monotonically. Although the first derivative of $g_{\mathrm{RS1}}(n,\beta)$ can be either positive or negative, in accordance with eq. (22) only the region of positive slope needs to be considered.

For $n\in\mathbb{N}$, the relevant solution of $g(n,\beta)$ can be chosen by selecting the lower of the two RS solutions, following the criterion of the conventional saddle point method. For $n\notin\mathbb{N}$, RM relies on the assumption that an analytical expression of $g(n,\beta)$ that is relevant for a certain natural number $k$ is also relevant in the vicinity of $k$, unless the analyticity is lost.$^{14)}$ Since $g_{\mathrm{RS1}}(n,\beta)=g_{\mathrm{RS2}}(n,\beta)$ holds at $n=1$, this implies that the selection of $g_{\mathrm{RS1}}(n,\beta)$ for $n\ge0$, which we tentatively adopted assuming that the analyticity of $g(n,\beta)$ is not broken between $n<0$ and $n\ge0$, is valid if

$$\left.\frac{\partial g_{\mathrm{RS2}}(n,\beta)}{\partial n}\right|_{n=1}-\left.\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial n}\right|_{n=1}=\sum_{j\in\mathcal{Y}}\frac{P(j)\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}}{\sum_{j'\in\mathcal{Y}}P(j')\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j',l)}}\left\{\ln\left[\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}\right]+\beta\frac{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}d(j,l)}{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta d(j,l)}}\right\}+R\ln2=\left.\left(-\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial n}+\frac{\beta}{n}\frac{\partial g_{\mathrm{RS1}}(n,\beta)}{\partial\beta}\right)\right|_{n=1}\ge0\tag{31}$$

holds, which corresponds to the situation illustrated in Fig. 3(a).

Fig. 3. Schematic profiles of $g_{\mathrm{RS1}}(n,\beta)$ and $g_{\mathrm{RS2}}(n,\beta)$ for a fixed $\beta$ in two situations, (a) and (b). The two functions intersect at $n=1$ and both are convex upward with respect to $n$. Whereas the first derivative of $g_{\mathrm{RS2}}(n,\beta)$ is always positive, that of $g_{\mathrm{RS1}}(n,\beta)$ depends on $R$, $D$ and $n$. RM assesses the value of $g(n,\beta)$ for $n\notin\mathbb{N}$ by analytically continuing the evaluation for $n\in\mathbb{N}$. This implies that, unless the analyticity is broken, the relevant solution for $n<1$ is whichever of $g_{\mathrm{RS1}}(n,\beta)$ and $g_{\mathrm{RS2}}(n,\beta)$ has the smaller slope at $n=1$. Thus, the relevant solution is $g_{\mathrm{RS1}}(n,\beta)$ in case (a) and $g_{\mathrm{RS2}}(n,\beta)$ in case (b).

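Let us denote the solution of eqs. (22) and (23) as $n=n_c$ and $\beta=\beta_c$, respectively. As $\left.\left(-\partial g_{\mathrm{RS1}}(n,\beta)/\partial n+(\beta/n)\,\partial g_{\mathrm{RS1}}(n,\beta)/\partial\beta\right)\right|_{n=n_c,\beta=\beta_c}=0$ holds, eq. (31) validates the selection of $g_{\mathrm{RS1}}(n,\beta)$, which yields the expression $\alpha_B(D,R)=KL(U_1\|P)$, if $n_c<1$ is obtained: $-\partial g_{\mathrm{RS1}}(n,\beta)/\partial n+(\beta/n)\,\partial g_{\mathrm{RS1}}(n,\beta)/\partial\beta$ is supposed to be positive for $n>n_c$ under the RS ansatz, which implies that eq. (31) holds. However, if $n_c>1$, eq. (31) does not hold, indicating the situation illustrated in Fig. 3(b). In such a case, $g_{\mathrm{RS1}}(n,\beta)$ is no longer relevant for $n=n_c$ and $\beta=\beta_c$, and therefore we have to amend the solution using $g_{\mathrm{RS2}}(n,\beta)$.

For $g_{\mathrm{RS2}}(n,\beta)$, eq. (11) is given as

$$\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V_2(l|j)U_2(j)d(j,l)=D,\tag{32}$$

where the distributions $U_2=(U_2(0),U_2(1),\ldots,U_2(J-1))$ and $V_2=\{V_2(l|j)\}$ ($j\in\mathcal{Y}$, $l\in\hat{\mathcal{Y}}$) are defined as

$$U_2(j)=\frac{P(j)\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta'd(j,l)}}{\sum_{j'\in\mathcal{Y}}P(j')\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta'd(j',l)}},\tag{33}$$

$$V_2(l|j)=\frac{Q(l)e^{-\beta'd(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta'd(j,l')}},\tag{34}$$

respectively, where $\beta'=n\beta$.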
Note that the value of $\beta'$ determined from eq. (32) is kept invariant when $\beta$ tends toward infinity. In order to assess eq. (10) for $g_{\mathrm{RS2}}(n,\beta)$, inserting eq. (32) yields the expression

$$\alpha_B(D,R)=KL(U_2\|P)+I-R\ln2,\tag{35}$$

where $I$ is defined as

$$I=\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V_2(l|j)U_2(j)\ln\left[\frac{V_2(l|j)}{Q(l)}\right].\tag{36}$$

In summary, the exponent $\alpha_B(D,R)$ for $R<R_c(D)$ is expressed as

$$\alpha_B(D,R)=\begin{cases}KL(U_1\|P), & \text{if } 0<n_c<1,\\ KL(U_2\|P)+I-R\ln2, & \text{if } n_c>1,\end{cases}\tag{37}$$

for a given ensemble specified by $Q$.

Here, $\alpha_B(D,R)$ can be minimized with respect to the distribution $Q$ in a manner similar to that for $\alpha_A(D,R)$. Namely, we tentatively adopt the first expression of eq. (37), assuming that $g_{\mathrm{RS1}}(n,\beta)$ is relevant, and employ ABA in order to obtain the optimal $n$, $\beta$ and $Q$. If the obtained solution of $n$, $n_c$, is smaller than 1, then the obtained expression is appropriate. Otherwise, we have to amend the solution using the second expression, which can be optimized by ABA as well. In this case, the convergent solution satisfies the relation $Q(l)=\sum_{j\in\mathcal{Y}}V_2(l|j)U_2(j)$ for $\forall l\in\hat{\mathcal{Y}}$. This yields the expression of the optimized exponent as

$$\alpha_B(D,R)=KL(U_2\|P)+(R(U_2,D)-R)\ln2,\tag{38}$$

where

$$R(U,D)=\min_{V:\sum_{j\in\mathcal{Y},l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)\le D}\ \sum_{j\in\mathcal{Y},l\in\hat{\mathcal{Y}}}V(l|j)U(j)\log_2\left[\frac{V(l|j)}{\sum_{j'\in\mathcal{Y}}V(l|j')U(j')}\right]\tag{39}$$

is termed the rate-distortion function, which represents the theoretically achievable limit of the compression rate for the information source $U$ when distortion up to $D$ is allowed in the limit $N,M\to\infty$.$^{11)}$
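Equation (39) can be evaluated by brute force for a binary source, which makes the definition concrete. The Python sketch below is an illustration rather than the ABA used in the text: it scans a grid of test channels $V$ for $\mathcal{Y}=\hat{\mathcal{Y}}=\{0,1\}$ with Hamming distortion and keeps the smallest mutual information that meets the distortion constraint. For this source, the minimum is known in closed form, $R(U,D)=h_2(u)-h_2(D)$ for $D<\min(u,1-u)$ and 0 otherwise, where $u=U(1)$ and $h_2$ is the binary entropy in bits; we quote that standard result only to check the grid search.

```python
import math


def mutual_info_bits(U, V):
    """Objective of eq. (39): sum_{j,l} V(l|j) U(j) log2[V(l|j) / sum_j' V(l|j') U(j')]."""
    out = [sum(V[j][l] * U[j] for j in range(2)) for l in range(2)]
    return sum(V[j][l] * U[j] * math.log2(V[j][l] / out[l])
               for j in range(2) for l in range(2) if V[j][l] > 0 and U[j] > 0)


def rate_distortion_grid(u, D, steps=200):
    """Eq. (39) by grid search over binary test channels V with Hamming distortion."""
    U = [1.0 - u, u]
    best = math.inf
    for i in range(steps + 1):
        a = i / steps                        # V(1|0)
        for k in range(steps + 1):
            b = k / steps                    # V(0|1)
            if U[0] * a + U[1] * b <= D:     # expected-distortion constraint
                best = min(best, mutual_info_bits(U, [[1 - a, a], [b, 1 - b]]))
    return best


def h2(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


print(rate_distortion_grid(0.2, 0.1), h2(0.2) - h2(0.1))   # close, up to grid resolution
print(rate_distortion_grid(0.2, 0.25))                     # D >= u: zero rate suffices
```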

4.5 Consistency with the IT literature

We obtained two expressions for the error exponents (29) and (37) using RM. In order to 
validate our results, we check for consistency with results in the IT literature. 

4.5.1 $\alpha_A(D,R)$

We first examine $\alpha_A(D,R)$ for $R>R_c(D)$. In the IT literature, the exponent for the best code is given$^{4)}$ as

$$\alpha_A^*(D,R)=\min_{U:R\le R(U,D)}KL(U\|P).\tag{40}$$

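This minimization problem can be solved by the method of Lagrange multipliers. Introducing auxiliary variables $z_1\,(\ge0)$ and $z_2\,(\le0)$ as

$$\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2=z_1,\tag{41}$$

$$\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)-D=z_2,\tag{42}$$

where

$$Q(l)=\sum_{j\in\mathcal{Y}}V(l|j)U(j),\tag{43}$$

eq. (40) is converted to the minimization problem of

$$J_A(U,V,z_1,z_2)=\sum_{j\in\mathcal{Y}}U(j)\ln\left[\frac{U(j)}{P(j)}\right]+n_A\left\{\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2-z_1+\beta_A\left(\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)-D-z_2\right)\right\}+\sum_{j\in\mathcal{Y}}\mu(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}V(l|j)-1\right\}+\xi\left\{\sum_{j\in\mathcal{Y}}U(j)-1\right\}\tag{44}$$

with respect to $U$, $V$, $z_1$ and $z_2$, where $n_A$, $\beta_A$, $\mu(j)$ ($j\in\mathcal{Y}$) and $\xi$ are Lagrange multipliers.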

Note that if the minimum is achieved at an internal point $z_1>0$, then $\partial J_A/\partial z_1=-n_A=0$; otherwise, the minimum is located on the boundary $z_1=0$ and $\partial J_A/\partial z_1=-n_A>0$. Since the rate-distortion function $R(U,D)$ decreases monotonically as $D$ increases$^{10)}$ and $R>R_c(D)$ is assumed, we cannot set $U=P$. Furthermore, taking the convexity of the KL divergence into account, minimization (40) must be achieved on the boundary, which ensures that $n_A<0$ ($z_1=0$). A similar argument holds for $\beta_A$: according to the convexity of the mutual information, the rate-distortion function $R(U,D)$ is determined by the distribution $V$ on its boundary, which indicates $\beta_A>0$ ($z_2=0$).

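Minimizing eq. (44) with respect to $V(l|j)$ provides

$$V(l|j)=\frac{Q(l)e^{-\beta_Ad(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta_Ad(j,l')}},\qquad\forall j\in\mathcal{Y},\ l\in\hat{\mathcal{Y}},\tag{45}$$

where the normalization constraint has already been factored into the equation. Similarly, minimization with respect to $U(j)$ yields

$$U(j)=\frac{P(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_Ad(j,l)}\right\}^{n_A}}{\sum_{j'\in\mathcal{Y}}P(j')\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_Ad(j',l)}\right\}^{n_A}},\qquad\forall j\in\mathcal{Y}.\tag{46}$$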



In practice, we can assess the optimal exponents using ABA to solve eqs. (46), (45) and (43) with respect to $n_A\,(<0)$, $\beta_A\,(>0)$ and $Q$ under the constraint that the solutions should be found on the boundary, which is represented as eqs. (41) and (42) with $z_1=z_2=0$. Identifying $n_A$ and $\beta_A$ with $n$ and $\beta$, respectively, this is nothing more than the procedure presented in the preceding section for obtaining the error exponent optimized with respect to the distribution $Q$. Therefore, our SM-based framework is consistent with the result for $\alpha_A(D,R)$ reported in the IT literature.
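For a binary source with Hamming distortion, eq. (40) reduces to a one-dimensional search, which provides an independent check on any ABA implementation. The Python sketch below assumes the closed-form binary rate-distortion function $R(U,D)=h_2(u)-h_2(D)$ for $D<\min(u,1-u)$ (and 0 otherwise), a standard textbook result, and scans $u=U(1)$ on a fine grid. It returns $+\infty$ when no $U$ satisfies $R\le R(U,D)$, i.e., when $R>1-h_2(D)$; the function names are our own.

```python
import math


def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def rd_binary(u, D):
    """Closed-form R(U, D) in bits for U = (1-u, u), Hamming distortion, 0 <= D <= 1/2 (assumed)."""
    return h2(u) - h2(D) if D < min(u, 1 - u) else 0.0


def kl_binary(u, p):
    """KL(U||P) in nats for U = (1-u, u) and P = (1-p, p), with 0 < p < 1."""
    return sum(a * math.log(a / b) for a, b in ((u, p), (1 - u, 1 - p)) if a > 0)


def alpha_A_star(p, D, R, steps=20000):
    """Eq. (40) by grid search over u; math.inf if the feasible set is empty."""
    best = math.inf
    for i in range(1, steps):
        u = i / steps
        if R <= rd_binary(u, D):
            best = min(best, kl_binary(u, p))
    return best


p, D = 0.2, 0.05
print(h2(p) - h2(D))                 # R(P, D) for this source, in bits
for R in (0.3, 0.5, 0.6, 0.9):
    print(R, alpha_A_star(p, D, R))  # 0 below R(P, D), growing above it, inf beyond 1 - h2(D)
```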



4.5.2 $\alpha_B(D,R)$

We next consider $\alpha_B(D,R)$ for $R<R_c(D)$. In the IT literature, the exponent for the best code for $R<R_c(D)$ is given$^{5)}$ as

$$\alpha_B^*(D,R)=\min_U\left\{KL(U\|P)+|R(U,D)-R|^+\ln2\right\},\tag{47}$$

where $|x|^+=x$ for $x>0$, and 0 otherwise. This can be separately expressed as the smaller of the two minimizations

$$\min_{U:R\ge R(U,D)}KL(U\|P),\tag{48}$$

$$\min_{U:R<R(U,D)}KL(U\|P)+(R(U,D)-R)\ln2.\tag{49}$$


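As with eq. (40), eq. (48) is converted to the minimization of

$$J_{B1}(U,V,z_1,z_2)=\sum_{j\in\mathcal{Y}}U(j)\ln\left[\frac{U(j)}{P(j)}\right]+n_{B1}\left\{\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2-z_1+\beta_{B1}\left(\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)-D-z_2\right)\right\}+\sum_{j\in\mathcal{Y}}\mu(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}V(l|j)-1\right\}+\xi\left\{\sum_{j\in\mathcal{Y}}U(j)-1\right\}\tag{50}$$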



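with respect to $U$, $V$, $z_1$ and $z_2$. Although the constraints $R<R_c(D)$ and $z_1\le0$ differ from those for $\alpha_A^*(D,R)$, the minimum is also achieved on the boundary in this case, which indicates $n_{B1}>0$ ($z_1=0$, $\partial J_{B1}/\partial z_1=-n_{B1}<0$). This means that the distribution $U$ and the conditional distribution $V$ can be represented as

$$U(j)=\frac{P(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_{B1}d(j,l)}\right\}^{n_{B1}}}{\sum_{j'\in\mathcal{Y}}P(j')\left\{\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_{B1}d(j',l)}\right\}^{n_{B1}}},\qquad\forall j\in\mathcal{Y},\tag{51}$$

$$V(l|j)=\frac{Q(l)e^{-\beta_{B1}d(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta_{B1}d(j,l')}},\qquad\forall j\in\mathcal{Y},\ l\in\hat{\mathcal{Y}},\tag{52}$$

using the Lagrange multipliers $n_{B1}$ and $\beta_{B1}$.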
Minimization (49) can be rewritten as

$$\min_{U,V:\ \sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)\le D,\ I>R\ln2}\ KL(U\|P)+I-R\ln2,\tag{53}$$

where $I$ is the mutual information expressed in eq. (36). This can also be solved by the method of Lagrange multipliers, which yields the minimization of



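$$J_{B2}(U,V,z_1,z_2)=\sum_{j\in\mathcal{Y}}U(j)\ln\left[\frac{U(j)}{P(j)}\right]+\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2+n_{B2}\left\{\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2-z_1\right\}+\beta_{B2}\left\{\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)-D-z_2\right\}+\sum_{j\in\mathcal{Y}}\mu(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}V(l|j)-1\right\}+\xi\left\{\sum_{j\in\mathcal{Y}}U(j)-1\right\},\tag{54}$$

where $n_{B2}$, $\beta_{B2}$, $\{\mu(j)\}$ and $\xi$ are Lagrange multipliers, and $z_1\,(\ge0)$ and $z_2\,(\le0)$ are defined in eqs. (41) and (42). Note that $n_{B2}\le0$ and $\beta_{B2}\ge0$; the equality $n_{B2}=0$ cannot be excluded because $KL(U\|P)+I-R\ln2$ is not a convex function.

If the minimization (48) (or (50)) is achieved for $0<n_{B1}<1$, the minimization (49) (or (54)) is achieved by the same $U$ and $V$ by setting $n_{B2}=n_{B1}-1\,(<0)$. However, if $n_{B1}>1$, no distribution that minimizes (48) can simultaneously be the solution of eq. (49), which indicates that eq. (49) is achieved by a distribution $U$ that satisfies $R<R(U,D)$. In this case, $n_{B2}$ must be zero, and therefore eq. (54) is reduced to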



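$$J_{B3}(U,V,z_1,z_2)=\sum_{j\in\mathcal{Y}}U(j)\ln\left[\frac{U(j)}{P(j)}\right]+\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)\ln\left[\frac{V(l|j)}{Q(l)}\right]-R\ln2+\beta_{B2}\left\{\sum_{j\in\mathcal{Y}}\sum_{l\in\hat{\mathcal{Y}}}V(l|j)U(j)d(j,l)-D-z_2\right\}+\sum_{j\in\mathcal{Y}}\mu(j)\left\{\sum_{l\in\hat{\mathcal{Y}}}V(l|j)-1\right\}+\xi\left\{\sum_{j\in\mathcal{Y}}U(j)-1\right\}.\tag{55}$$

Differentiating eq. (55) with respect to $V(l|j)$ and $U(j)$ yields

$$V(l|j)=\frac{Q(l)e^{-\beta_{B2}d(j,l)}}{\sum_{l'\in\hat{\mathcal{Y}}}Q(l')e^{-\beta_{B2}d(j,l')}},\qquad\forall j\in\mathcal{Y},\ l\in\hat{\mathcal{Y}},\tag{56}$$

$$U(j)=\frac{P(j)\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_{B2}d(j,l)}}{\sum_{j'\in\mathcal{Y}}P(j')\sum_{l\in\hat{\mathcal{Y}}}Q(l)e^{-\beta_{B2}d(j',l)}},\qquad\forall j\in\mathcal{Y}.\tag{57}$$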



Based on the above argument, the optimal exponent $\alpha_B^*(D,R)$ is assessed by the following procedure. First, we employ ABA for the solution of minimization (48) with respect to $n_{B1}\,(>0)$, $\beta_{B1}\,(>0)$ and $Q$ using eqs. (51), (52) and (43) under the constraints (41) and (42) ($z_1=z_2=0$). If the solved $n_{B1}$ satisfies $0<n_{B1}<1$, it is guaranteed that the obtained $U$ and $V$ achieve the minimization (47). However, if the obtained $n_{B1}$ is greater than 1, this solution is not appropriate, because minimization (49) is not achieved. Therefore, we have to search for another solution using eqs. (57), (56) and (43) with $\beta_{B2}>0$ under the constraint (42) ($z_2=0$), which can also be performed by ABA. In this case, the other constraint (41) ($z_1>0$) is always satisfied for $n_{B1}>1$, which is confirmed by the fact that the minimization of



can be achieved by an internal point with respect to z\ if and only if < ngi < 1. 

This procedure is identical to that of the RM-based approach presented in a previous 
section. Therefore, the framework developed in this paper is consistent with the result for 
a* B (D,R) reported in the IT literature. 

4-6 Discussion 

Here, two points are worth noting. First, we have shown that the exponents assessed by 
the RM-based method become identical to those of the best code in the IT literature, when 
optimized with respect to the code ensemble. However, this may be somewhat curious because 
a {A,B}(D, R) characterizes either the average of the compression failure or the success prob- 
ability over a code ensemble, which implies that cx{a,b}(D,R) does not necessarily coincide 
with the exponent of the best code, even if the ensemble is optimized. In order to examine a 
possible difference in exponents between the average and optimal probabilities, we evaluated 
the exponents of the minimum failure probability P£ = lim t ^_ 00 (P^(C, £>))J/* for R > R C {D) 
and the maximum success probability P s * = lim^ +00 (Pj(C, D)) 1 ^ for R < R C (D) for fixed 
ensembles, which reduced to the current calculations for the average probabilities. This means 
that in the RCE specified by Q, the performance of the best code is identical to that of typical 
codes in terms of the exponents, although differences may exist for ensembles of other types. 
Second, we may be able to apply the present framework to sources with memory, for which 
the optimal exponents have not been reported in the IT literature. This possibility is currently 
under investigation. 

5. Application to a Sub-optimal Ensemble 

In addition to consistency with the existing results, a major advantage of the proposed 
RM-based approach is its ability to accurately evaluate the exponents for a wider class of 
ensembles. Here, we demonstrate this ability for a lossy compression of a binary memoryless 
source, which is specified by P = (P(0), P(l)) = (1 — p,p) where < p < 1/2. 

Although RCEs exhibit the optimal performance, they are difficult to implement in prac- 
tice because a storage of 0(Mx2 N ) is required in order to express the set of representative 
vectors y(s). As a candidate to resolve this difficulty, we investigate the performance of a 
compression scheme which utilizes perceptrons having random connections. 15 



U:R>R(U,D) 



mm 



KL{U\\P) + (R(U,D) -R)]n2 



(58) 
More specifically, we define a map from the compressed expression $s \in \{+1,-1\}^N$ to the representative sequence $y(s) \in \{0,1\}^M$ as

$$
y^\mu(s) = f\left( \frac{1}{\sqrt{N}}\, s \cdot x^\mu \right), \quad (\mu = 1, 2, \ldots, M) \tag{59}
$$

for the specification of a code. Here, $f(\cdot)$ is a function whose output is limited to $\{0,1\}$, and $x^{\mu=1,2,\ldots,M}$ are randomly predetermined $N$-dimensional vectors generated from the $N$-dimensional normal distribution $P(x) = (\sqrt{2\pi})^{-N}\exp\left[-|x|^2/2\right]$. These vectors are known to both the compressor and the decompressor and act as the codebook. For convenience, we use the alphabet $\{+1,-1\}$ for the compressed sequence $s$, rather than the conventional alphabet $\{0,1\}$.
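Below is a minimal sketch of the codebook and of the map (59), assuming NumPy. For concreteness, $f$ is the window function $f(u) = 1$ for $|u| < k$ and $0$ otherwise, which is the choice discussed later in this section. The function names and sizes are hypothetical. The compressor and decompressor only need to share the $M \times N$ array of connection vectors.

```python
import numpy as np


def make_codebook(N, M, rng):
    """Draw x^1, ..., x^M i.i.d. from the N-dimensional standard normal distribution."""
    return rng.standard_normal((M, N))   # M x N numbers: O(M x N) storage


def f_window(u, k):
    """Example transfer function: 1 if |u| < k, else 0."""
    return (np.abs(u) < k).astype(int)


def representative(s, X, k):
    """y^mu(s) = f(s . x^mu / sqrt(N)) for mu = 1, ..., M, as in eq. (59)."""
    s = np.asarray(s, dtype=float)
    return f_window(X @ s / np.sqrt(X.shape[1]), k)


rng = np.random.default_rng(1)
X = make_codebook(N=10, M=50, rng=rng)
s = rng.choice([-1, 1], size=10)   # a compressed expression in {+1, -1}^N
y = representative(s, X, k=0.136)
```

Because this $f$ is even, $s$ and $-s$ produce the same representative vector. This is the mirror symmetry exploited in the replica analysis.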

We employ the Hamming distortion $d(y, y(s)) = \sum_{\mu=1}^{M}\left[1 - \delta_{y^\mu,\, y^\mu(s)}\right]$ to measure the fidelity of the representative sequences. A lossy compression scheme can then be defined on the basis of eq.(59) as follows:

• Compression: For a given message y, find a vector s that minimizes the distortion 
d(y,y(s)), where y(s) is the representative vector that is uniquely generated from s by 
eq.(59). The obtained s is adopted as the compressed expression. 

• Decoding: Given the compressed expression s, the representative vector y(s) produced 
by eq.(59) yields an approximation of the original message. 
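For small $N$, the compression and decoding steps above can be sketched as an exhaustive search over all $2^N$ compressed expressions. The numerical experiments in this section also perform compression by exhaustive search. The window-type $f$ with threshold $k$, the function names and the sizes are illustrative assumptions.

```python
import itertools

import numpy as np


def all_representatives(X, k):
    """Enumerate all 2^N compressed expressions s and their vectors y(s) (eq. (59))."""
    N = X.shape[1]
    S = np.array(list(itertools.product([1, -1], repeat=N)))   # shape (2^N, N)
    Y = (np.abs(S @ X.T / np.sqrt(N)) < k).astype(int)         # shape (2^N, M)
    return S, Y


def compress(y, S, Y):
    """Exhaustive search: return the s minimizing the Hamming distortion d(y, y(s))."""
    dist = np.sum(Y != y[None, :], axis=1)
    i = int(np.argmin(dist))
    return S[i], int(dist[i])


def decode(s, X, k):
    """Regenerate y(s) from s with the shared codebook X."""
    return (np.abs(X @ s / np.sqrt(X.shape[1])) < k).astype(int)


rng = np.random.default_rng(2)
N, M, k, p = 8, 40, 0.136, 0.2             # rate R = N / M = 0.2
X = rng.standard_normal((M, N))
S, Y = all_representatives(X, k)
y = (rng.random(M) < p).astype(int)        # message from the binary memoryless source
s_hat, d_min = compress(y, S, Y)
y_hat = decode(s_hat, X, k)                # approximation of the original message
```

The cost grows as $2^N$, which is why exhaustive search limits the attainable system size.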

Random selection of the connections naturally defines a code ensemble of this scheme. 

Codes of this type may be preferred for practical implementation because the necessary storage cost is only $O(M \times N)$. However, possible correlations between the components of the representative vector may prevent analysis of its performance by the conventional methods of the IT literature. Nevertheless, the proposed RM-based approach makes it possible to accurately evaluate the performance of this ensemble. The recipe is similar to the capacity analysis of perceptrons, which has been reported extensively over the last decade.$^{16}$

In a previous paper,$^{15}$ such an analysis showed that the function $f(u) = 1$ for $|u| < k$ and $0$ otherwise offers optimal performance in the limit $M, N \to \infty$. It achieves the rate-distortion function $R(p,D) = H_2(p) - H_2(D)$ for this case, where $H_2(x) = -x\log_2 x - (1-x)\log_2(1-x)$ for $0 < x < 1$, provided that $k$ is adjusted such that $2\int_0^k \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} = \frac{p - D^*}{1 - 2D^*}$. Here, $D^*$ represents the lower bound of the Hamming distortion for a given compression rate $R$ and is obtained from the inverse function of the rate-distortion function. This optimality holds except in a very narrow AT instability region in the vicinity of $p = 0.5$.
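The quantities in the preceding paragraph are easy to compute numerically. The sketch below uses only the Python standard library. It inverts $R(p,D) = H_2(p) - H_2(D)$ for $D^*$ by bisection on $0 \le D \le p$, then solves for $k$ using the identity $2\int_0^k \frac{dz}{\sqrt{2\pi}}e^{-z^2/2} = \mathrm{erf}(k/\sqrt{2})$. The helper names are hypothetical.

```python
import math


def H2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def bisect(fn, lo, hi, iters=200):
    """Root of fn on [lo, hi], assuming fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def D_star(p, R):
    """Inverse of R(p, D) = H2(p) - H2(D) on 0 <= D <= p (requires 0 < R < H2(p))."""
    return bisect(lambda D: H2(p) - H2(D) - R, 0.0, p)


def k_rate_distortion(p, R):
    """k such that erf(k / sqrt(2)) = (p - D*) / (1 - 2 D*)."""
    D = D_star(p, R)
    target = (p - D) / (1 - 2 * D)
    return bisect(lambda k: math.erf(k / math.sqrt(2)) - target, 0.0, 10.0)


print(D_star(0.2, 0.2), k_rate_distortion(0.2, 0.2))   # roughly 0.117 and 0.136
```

For $p = 0.2$ and $R = 0.2$, the sketch gives $D^* \approx 0.117$ and $k \approx 0.136$. These are the values quoted for the fixed-$k$ data of Fig. 4.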

The error exponents of this ensemble can be calculated by a procedure similar to that for the RCE. Taking the average of $\left(\sum_s e^{-\beta d(y, y(s))}\right)^n$ with respect to the original message $y$ and the connection vectors $x^{\mu=1,2,\ldots,M}$ yields

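$$
g(n,\beta) = \mathop{\rm extr}_{\{q_{ab}\}} \left\{ -\ln\left[ \sum_{y=0,1} P(y) \int d\boldsymbol{u}\, \frac{\exp\left(-\frac{1}{2}\boldsymbol{u}^{\rm T}\Lambda^{-1}\boldsymbol{u}\right)}{\sqrt{(2\pi)^n \det\Lambda}} \prod_{a=1}^{n}\left[ e^{-\beta} + \left(1 - e^{-\beta}\right)\Theta_k(u^a; y) \right] \right] - \frac{R}{N}\ln \mathop{\rm Tr}_{s^1,\ldots,s^n} \prod_{a>b}\delta\left( s^a\cdot s^b - N q_{ab} \right) \right\}, \tag{60}
$$

where $q_{ab} = s^a\cdot s^b/N$ for $a > b = 1, 2, \ldots, n$,

$$
\Theta_k(u;1) = 1 - \Theta_k(u;0) = \begin{cases} 1, & |u| < k, \\ 0, & \text{otherwise}, \end{cases} \tag{61}
$$

and $\Lambda = \left(\delta_{ab} + (1-\delta_{ab})q_{ab}\right)$. This expression corresponds to eq.(13) for the RCE.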
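The replica-symmetric covariance $\Lambda$ with $q_{ab} = q$ can be realized as $u^a = \sqrt{q}\,z_0 + \sqrt{1-q}\,z_a$, where $z_0, z_1, \ldots, z_n$ are independent standard normal variables. The sketch uses this construction to estimate the energetic factor of eq.(60) by Monte Carlo. That factor is $\sum_y P(y)\,\mathbb{E}_{u}\prod_a[e^{-\beta} + (1-e^{-\beta})\Theta_k(u^a;y)]$, the average over $y$ of the Gaussian integral over $u$. The sketch compares the estimate at $q = 0$ and $q = 1$ with the brackets of eqs.(62) and (63). It is an illustrative consistency check under these assumptions, not part of the original analysis, and the function names are hypothetical.

```python
import math

import numpy as np


def energetic_factor_mc(n, q, beta, p, k, n_samples=200_000, seed=0):
    """Monte Carlo estimate of sum_y P(y) E_u[prod_a (e^-beta + (1 - e^-beta) Theta_k(u^a; y))]
    with u ~ N(0, Lambda), Lambda_ab = delta_ab + (1 - delta_ab) q."""
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal((n_samples, 1))
    za = rng.standard_normal((n_samples, n))
    u = math.sqrt(q) * z0 + math.sqrt(1 - q) * za   # Cov(u^a, u^b) = q (a != b), 1 (a = b)
    theta1 = (np.abs(u) < k).astype(float)           # Theta_k(u; 1) of eq. (61)
    eb = math.exp(-beta)
    total = 0.0
    for P_y, theta in ((p, theta1), (1 - p, 1 - theta1)):   # y = 1, then y = 0
        total += P_y * np.mean(np.prod(eb + (1 - eb) * theta, axis=1))
    return total


def closed_forms(n, beta, p, k):
    """Brackets of eqs. (62) (q = 0) and (63) (q = 1), with eta = 1 - erf(k / sqrt(2))."""
    eta = 1 - math.erf(k / math.sqrt(2))
    e1, en = math.exp(-beta), math.exp(-n * beta)
    rs1 = p * (1 - eta + eta * e1) ** n + (1 - p) * ((1 - eta) * e1 + eta) ** n
    rs2 = p * (1 - eta + eta * en) + (1 - p) * ((1 - eta) * en + eta)
    return rs1, rs2
```

At $q = 1$ all replicas share the same $u$. Because $\Theta_k \in \{0, 1\}$, the product then collapses to $e^{-n\beta}$ or $1$, which is why $n\beta$ appears in eq.(63).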

The mirror symmetry $f(-u) = f(u)$ of the transfer function yields a solution $q_{ab} = q = 0$ under the RS ansatz,$^{15}$ which offers

$$
g_{\rm RS1}(n,\beta) = -\ln\left[ p\left\{1 - \eta + \eta e^{-\beta}\right\}^n + (1-p)\left\{(1-\eta)e^{-\beta} + \eta\right\}^n \right] - nR\ln 2, \tag{62}
$$

where $\eta$ is defined as $\eta = 1 - 2\int_0^k \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$. In addition, there exists another RS solution

$$
g_{\rm RS2}(n,\beta) = -\ln\left[ p\left\{1 - \eta + \eta e^{-n\beta}\right\} + (1-p)\left\{(1-\eta)e^{-n\beta} + \eta\right\} \right] - R\ln 2, \tag{63}
$$

corresponding to $q_{ab} = q = 1$. For the current source and the Hamming distortion, eqs.(62) and (63) coincide with eqs.(14) and (15), respectively.
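Equations (62) and (63) can be coded directly. In this standard-library sketch, $\eta$ is computed as $1 - \mathrm{erf}(k/\sqrt{2})$, which equals $1 - 2\int_0^k \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}$. The function names and test values are hypothetical.

```python
import math


def eta_of(k):
    """eta = 1 - 2 int_0^k Dz = 1 - erf(k / sqrt(2)): probability that f(u) = 0."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def g_rs1(n, beta, p, k, R):
    """Eq. (62): replica-symmetric solution with q = 0."""
    eta, e = eta_of(k), math.exp(-beta)
    a1 = 1 - eta + eta * e          # source symbol y = 1
    a0 = (1 - eta) * e + eta        # source symbol y = 0
    return -math.log(p * a1 ** n + (1 - p) * a0 ** n) - n * R * math.log(2)


def g_rs2(n, beta, p, k, R):
    """Eq. (63): replica-symmetric solution with q = 1."""
    eta, e = eta_of(k), math.exp(-n * beta)
    b1 = 1 - eta + eta * e
    b0 = (1 - eta) * e + eta
    return -math.log(p * b1 + (1 - p) * b0) - R * math.log(2)


print(g_rs1(0.5, 1.0, 0.2, 0.136, 0.2), g_rs2(0.5, 1.0, 0.2, 0.136, 0.2))
```

Two checks follow from the formulas. First, $g_{\rm RS1}(0,\beta) = 0$. Second, the two solutions coincide at $n = 1$, since with a single replica there is no overlap $q_{ab}$ to distinguish them.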

Therefore, we can reuse the calculation for the RCE to examine the performance of the current ensemble. This indicates that the optimal error exponents can be obtained by adjusting the parameter $k$ to its optimal value for each pair of $D$ and $R$, unless AT instability occurs for the above RS solutions. The optimal $k$ satisfies $2\int_0^k \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} = \frac{p^* - D}{1 - 2D}$, where $p^*$ satisfies the relation $R = H_2(p^*) - H_2(D)$ for the rate $R$ and the given permissible level $D$. Note that for the current source, the optimal exponents can always be achieved by $g_{\rm RS1}(n,\beta)$; $g_{\rm RS2}(n,\beta)$ becomes dominant only in suboptimal cases for $\alpha_B(D,R)$ (Fig. 4(b) inset).
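The optimal $k$ for a given pair $(D, R)$ can be obtained in two bisection steps. First, solve $R = H_2(p^*) - H_2(D)$ for $p^* \in [D, 1/2]$. Second, solve $\mathrm{erf}(k/\sqrt{2}) = (p^* - D)/(1 - 2D)$. This is a standard-library sketch with hypothetical helper names. It assumes $0 \le D < 1/2$ and $R + H_2(D) \le 1$, so that a solution $p^* \le 1/2$ exists.

```python
import math


def H2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def bisect(fn, lo, hi, iters=200):
    """Root of fn on [lo, hi], assuming fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_star(D, R):
    """p* in [D, 1/2] with R = H2(p*) - H2(D)."""
    if R + H2(D) > 1.0:
        raise ValueError("no p* <= 1/2 satisfies R = H2(p*) - H2(D)")
    return bisect(lambda x: H2(x) - H2(D) - R, D, 0.5)


def k_optimal(D, R):
    """k such that erf(k / sqrt(2)) = (p* - D) / (1 - 2D)."""
    target = (p_star(D, R) - D) / (1 - 2 * D)
    return bisect(lambda k: math.erf(k / math.sqrt(2)) - target, 0.0, 10.0)


print(k_optimal(0.2, 0.2), k_optimal(0.0, 0.2))
```

When $D$ equals the $D^*$ of the rate-distortion relation, $p^*$ reduces to the source parameter $p$, and $k$ returns to the value that realizes $R(p,D)$. By the condition above, the optimal $k$ depends only on $(D, R)$.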

To justify the above analysis, we performed numerical experiments implementing the proposed scheme. Because compression was performed by exhaustive search, the system size was limited to $N \le 20$. Fig. 4 shows the exponents averaged over the results of $5\times 10^3$ to $1\times 10^6$ experiments for the case of $p = 0.2$, $R = 0.2$.

The white circles and triangles indicate data obtained by adjusting $k$ so that the exponents are optimized for (a) $D = 0.2$ and (b) $D = 0.0$, respectively. The black circles and triangles indicate data obtained using $k \approx 0.136$, chosen to reproduce the rate-distortion relation, which implies that both exponents vanish at $D^* \approx 0.117$. In Fig. 4(a), the white symbols increase at $D = 0.2$ as $N$ grows, whereas the black symbols decrease, and each set consistently approaches its theoretical prediction. Fig. 4(b) shows that the white symbols lie below the black symbols at $D = 0.0$. In both figures, the discrepancies between the experimental data and the theoretical predictions are attributed to finite-size effects.
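An experiment of this type can be mimicked on a much smaller scale. The sketch estimates the failure probability averaged over the ensemble by drawing a fresh codebook and message in each trial. It performs compression by exhaustive search and converts the result into the empirical exponent $-\frac{1}{M}\ln P_F$. The estimator, the window-type $f$, the sizes and the function name are our assumptions, not a description of the authors' procedure. For $R < R(p,D)$, the success probability would be estimated the same way by counting trials with $d \le MD$.

```python
import itertools

import numpy as np


def failure_probability(N, R, p, D, k, trials, seed=0):
    """Fraction of trials in which the best s still has d(y, y(s)) > M D,
    drawing a fresh code (codebook X) and a fresh message y in every trial."""
    rng = np.random.default_rng(seed)
    M = int(round(N / R))                                   # R = N / M
    S = np.array(list(itertools.product([1, -1], repeat=N)))
    failures = 0
    for _ in range(trials):
        X = rng.standard_normal((M, N))                     # code drawn from the ensemble
        Y = (np.abs(S @ X.T / np.sqrt(N)) < k).astype(int)  # all representative vectors
        y = (rng.random(M) < p).astype(int)                 # binary memoryless source
        d_min = np.min(np.sum(Y != y[None, :], axis=1))     # exhaustive-search compression
        failures += int(d_min > M * D)
    P_F = failures / trials
    exponent = -np.log(P_F) / M if P_F > 0 else float("inf")
    return P_F, exponent


print(failure_probability(N=8, R=0.2, p=0.2, D=0.2, k=0.136, trials=200))
```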
[Figure 4: panels (a) and (b); horizontal axis $D$]

Fig. 4. Error exponents (a) $\alpha_A(D,R)$ and (b) $\alpha_B(D,R)$ for $p = 0.2$, $R = 0.2$. The solid and dashed curves indicate, respectively, the optimal exponents $\alpha^*_{\{A,B\}}(D,R)$ and the exponents obtained for a fixed $k$ ($\approx 0.136$) that realizes the rate-distortion relation $R(p,D) = H_2(p) - H_2(D)$. For $0 \le D < 0.011$ in Fig. 4(b), the dashed curve is obtained from the solution $g_{\rm RS2}(n,\beta)$, which dominates $g_{\rm RS1}(n,\beta)$ in this region (Fig. 4(b) inset). The experimental data were obtained through exhaustive search from (a) 5000–10000 trials for $N = 10, 20$ and (b) $10^6$ trials for $N = 4, 10$. The white circles and triangles represent the exponents optimized for (a) $D = 0.2$ and (b) $D = 0.0$, respectively, by adjusting $k$. The black symbols indicate the exponents for the fixed $k$ ($\approx 0.136$).

6. Summary 

In summary, we have developed a scheme by which to assess the error exponents of a 
lossy data compression problem using RM. The newly developed RM-based approach for the 
exponents corresponding to the average failure or success probabilities for the random code 
ensembles reproduces the optimal error exponents achieved by selecting the best code reported 
in the IT literature, which indicates that the performance of the best code is identical to that 
of typical codes in terms of error exponents. Furthermore, the proposed framework makes an 
accurate assessment of the coding performance possible for a wide class of code ensembles. 
Using this capability, we have shown that a lossy compression scheme based on a specific type of non-monotonic perceptron provides the optimal exponents in most cases. Numerical experiments support this result.

Evaluation of the error exponents of practical algorithms for lossy data compression is a 
subject for future study. In order to reduce the computational cost of the proposed coding 
scheme, the development of approximation algorithms by which to realize the compression 
phase using a perceptron is currently under way. 